Abstract: In the application of autoregressive models the order of the model is often estimated using either a sequence of likelihood ratio tests, a likelihood-based information criterion, or a residual-based test. The properties of such procedures have been discussed extensively under the assumption that the characteristic roots of the autoregression are stationary. While non-stationary situations have also been considered, the results in the literature depend on conditions on the characteristic roots. It is shown here that these methods for lag length determination can be used regardless of the assumptions on the characteristic roots and also in the presence of deterministic terms. The proofs are based on methods developed by C. Z. Wei in his joint work with T. L. Lai.

1. Introduction 

Order determination for stationary autoregressive time series has been discussed extensively in the literature. The three prevailing methods are to test redundancy of the last lag using a likelihood-based test, to estimate the lag length consistently using an information criterion, or to investigate the residuals of a fitted model for autocorrelation. It is shown that these methods can be used regardless of any assumptions on the characteristic roots. This is important in applications, as the question of lag length can be addressed without having to locate the characteristic roots.

The statistical model is given by a $p$-dimensional time series $X_t$ of length $K + T$ satisfying a $K$th order vector autoregressive equation

$$X_t = \sum_{j=1}^{K} A_j X_{t-j} + \mu D_t + \varepsilon_t, \qquad t = 1, \dots, T, \tag{1.1}$$

conditional on the initial values $X_0, \dots, X_{1-K}$. The effective sample will remain $X_1, \dots, X_T$ when discussing autoregressions with $k < K$ lags, to allow comparison of likelihood values. The component $D_t$ is a vector of deterministic terms such as a constant, a linear trend, or seasonal dummies. For the sake of defining a likelihood function it is initially assumed that the innovations $(\varepsilon_t)$ are independent and identically $N_p(0, \Omega)$-distributed, and independent of the initial values.

The aim is to determine the largest non-trivial order for the time series, $k_0$ say with $0 \le k_0 \le K$, so $A_{k_0} \neq 0$ and $A_j = 0$ for $j > k_0$. Three approaches are available, of which the first is based on a likelihood ratio test for $A_k = 0$ where $1 \le k \le K$. The log likelihood ratio test statistic is

$$LR(A_k = 0) = T\log\det\hat\Omega_{k-1} - T\log\det\hat\Omega_k,$$

where $\hat\Omega_{k-j}$ is the conditional maximum likelihood estimator based on the observations $X_1, \dots, X_T$ given the initial values, see (3.2) below. The statistic $LR$ is proved to be asymptotically $\chi^2$ under the hypothesis $k_0 < k$, generalising results for the purely non-explosive case. Since the result does not depend on the characteristic roots, it can be used for lag length determination before locating the characteristic roots.
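As an illustration of how the statistic above can be computed, the following sketch assumes Gaussian least squares, so that $\hat\Omega_j$ is the residual covariance from regressing $X_t$ on $X_{t-1}, \dots, X_{t-j}$ and $D_t$ over the fixed effective sample $t = 1, \dots, T$. The function names `omega_hat` and `lr_stat`, and the simulated bivariate process with one unit root, are hypothetical choices made only for this example.

```python
import numpy as np


def omega_hat(X, D, k, K):
    """Least-squares (conditional ML) variance estimator with k lags.

    X : (K + T, p) array with rows X_{1-K}, ..., X_0, X_1, ..., X_T.
    D : (T, d) array with rows D_1, ..., D_T (d may be 0).
    The effective sample is t = 1, ..., T for every k <= K, so likelihood
    values for different k are comparable.
    """
    T = X.shape[0] - K
    y = X[K:]                                   # X_t for t = 1, ..., T
    # Regressors X_{t-1}, ..., X_{t-k} and D_t, all aligned on t = 1, ..., T.
    Z = np.hstack([X[K - j:K - j + T] for j in range(1, k + 1)] + [D])
    if Z.shape[1] == 0:
        resid = y
    else:
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
    return resid.T @ resid / T


def lr_stat(X, D, k, K):
    """LR(A_k = 0) = T log det Omega_{k-1} - T log det Omega_k."""
    T = X.shape[0] - K
    _, logdet_restricted = np.linalg.slogdet(omega_hat(X, D, k - 1, K))
    _, logdet_unrestricted = np.linalg.slogdet(omega_hat(X, D, k, K))
    return T * (logdet_restricted - logdet_unrestricted)


# Hypothetical example: bivariate VAR(1) with one unit root and a constant.
rng = np.random.default_rng(0)
p, K, T = 2, 4, 300
A1 = np.array([[1.0, 0.0], [0.5, 0.5]])        # eigenvalues 1 and 0.5
mu = np.array([0.0, 0.3])
X = np.zeros((K + T, p))
for i in range(1, K + T):
    X[i] = A1 @ X[i - 1] + mu + rng.standard_normal(p)
D = np.ones((T, 1))                             # D_t = 1

lr2 = lr_stat(X, D, k=2, K=K)   # k_0 = 1 < 2: approximately chi^2(p^2) = chi^2(4)
lr1 = lr_stat(X, D, k=1, K=K)   # A_1 = 0 is false here, so this is large
print(lr2, lr1)
```

Because the restricted and unrestricted fits share the sample $t = 1, \dots, T$, the statistic is non-negative by construction.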

The second approach is to estimate $k_0$ by the argument $\hat k$ that maximises a penalised likelihood, or equivalently, minimises an information criterion of the type

$$\Phi_j = \log\det\hat\Omega_j + j\,\frac{f(T)}{T}, \qquad j = 0, \dots, K. \tag{1.2}$$



In the literature there are several candidates for the penalty function $f$. Akaike has $f(T) = 2p^2$, Schwarz has $f(T) = p^2\log T$, while Hannan and Quinn [10] and Quinn [22] have $f(T) = 2p^2\log\log T$. For stationary processes without deterministic components it has been shown that the estimator $\hat k$ is weakly consistent if $f(T) = o(T)$ and $f(T) \to \infty$ as $T$ increases, while Hannan and Quinn show, for $p = 1$, that strong consistency is obtained if $f(T) = o(T)$ and $\liminf_{T\to\infty} f(T)/\log\log T > 2$, whereas strong consistency cannot be obtained if $\limsup_{T\to\infty} f(T)/\log\log T < 2$. In other words, the estimators of Hannan and Quinn and of Schwarz are consistent while Akaike's estimator is inconsistent. Some generalisations to non-explosive processes have been given by, for instance, Paulsen [20], Pötscher [21] and Tsay [24]. Pötscher also considered the purely explosive case but did not obtain a common feasible rate for $f(T)$ for the explosive and the non-explosive case. In the following, consistency is shown for a penalty function $f(T)$ not depending on the characteristic roots, showing that the penalised likelihood approach can also be applied to lag length determination prior to locating the characteristic roots.
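A minimal sketch of the penalised likelihood estimator (1.2) with the three penalty functions listed above, using a least-squares variance estimator on the fixed sample $t = 1, \dots, T$. The helper names and the simulated VAR(2), whose first component has characteristic roots 1 and 0.4 and whose second component is stationary, are illustrative assumptions; at this sample size the Schwarz penalty should recover $k_0 = 2$.

```python
import numpy as np


def omega_hat(X, D, k, K):
    """Least-squares variance estimator with k lags on the fixed sample t = 1..T."""
    T = X.shape[0] - K
    y = X[K:]
    Z = np.hstack([X[K - j:K - j + T] for j in range(1, k + 1)] + [D])
    if Z.shape[1] == 0:
        return y.T @ y / T
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return resid.T @ resid / T


def select_order(X, D, K, f):
    """Return argmin_j Phi_j with Phi_j = log det Omega_j + j f(T) / T, j = 0..K."""
    T = X.shape[0] - K
    crit = [np.linalg.slogdet(omega_hat(X, D, j, K))[1] + j * f(T) / T
            for j in range(K + 1)]
    return int(np.argmin(crit))


p = 2
penalties = {
    "Akaike": lambda T: 2 * p**2,
    "Schwarz": lambda T: p**2 * np.log(T),
    "Hannan-Quinn": lambda T: 2 * p**2 * np.log(np.log(T)),
}

# Hypothetical VAR(2): z^2 - 1.4 z + 0.4 = (z - 1)(z - 0.4) for component 1,
# z^2 - 0.5 z - 0.4 has roots of modulus < 1 for component 2.
rng = np.random.default_rng(1)
K, T = 4, 500
A1 = np.diag([1.4, 0.5])
A2 = np.diag([-0.4, 0.4])
X = np.zeros((K + T, p))
for i in range(2, K + T):
    X[i] = A1 @ X[i - 1] + A2 @ X[i - 2] + rng.standard_normal(p)
D = np.ones((T, 1))                             # constant term

k_hat = {name: select_order(X, D, K, f) for name, f in penalties.items()}
print(k_hat)
```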

A third approach is a residual-based mis-specification test. This is implemented in particular in econometric computer packages. In a first step the residuals, $\hat\varepsilon_t$ say, are computed from the model (1.1) with $k - 1$ lags, say. In a second step an auxiliary regression is considered where $\hat\varepsilon_t$ is regressed on its lagged values as well as the regressors in equation (1.1). It is argued that a test based on the squared multiple correlation arising from the auxiliary regression is asymptotically equivalent to the above-mentioned likelihood ratio test statistic, also in the general case.

Like the work of Pötscher [21], the proofs in this paper are based on the joint work of C. Z. Wei and T. L. Lai on the strong consistency of least squares estimators in autoregressions, see for instance Lai and Wei [15]. As pointed out in Pötscher's Remark 1 to his Theorem 3.3, these results are not quite strong enough to facilitate common feasible rates for the penalty function. Two important ingredients in the presented proofs are therefore an algebraic decomposition exploiting partitioned inversion, along with a generalisation of Lai and Wei's work given by Nielsen [17]. Whereas the former paper is concerned with showing that the least squares estimator for the autoregressive parameter is consistent, the latter paper provides a more detailed discussion of the rate of consistency and also allows deterministic terms in the autoregression.

The following notation is used throughout the paper: for a square matrix $a$ let $\operatorname{tr}(a)$ denote the trace and $\lambda(a)$ the set of eigenvalues, so that $|\lambda(a)| < 1$ means that all eigenvalues have absolute value less than one. When $a$ is also symmetric, $\lambda_{\min}(a)$ and $\lambda_{\max}(a)$ denote the smallest and the largest eigenvalue, respectively. The abbreviations a.s. and $P$ are used for properties holding almost surely and in probability, respectively.
2. Results

Before presenting the results, the assumptions and notation are set up. Then the results follow for the three approaches.

2.1. Assumptions and notation 

The asymptotic analysis is to a large extent based on results of Lai and Wei, with appropriate modifications to the situation with deterministic terms in Nielsen [17]. Following that analysis, the assumption of independence and normality of the innovations made above can be relaxed so that the sequence of innovations $(\varepsilon_t)$ is a martingale difference sequence with respect to an increasing sequence of $\sigma$-fields $(\mathcal F_t)$, that is: the initial values $X_{1-K}, \dots, X_0$ are $\mathcal F_0$-measurable and $\varepsilon_t$ is $\mathcal F_t$-measurable with $E(\varepsilon_t|\mathcal F_{t-1}) = 0$ a.s., which is assumed to satisfy

$$\sup_t E\{(\varepsilon_t'\varepsilon_t)^{\lambda/2} \mid \mathcal F_{t-1}\} < \infty \quad \text{a.s. for some } \lambda > 4. \tag{2.1}$$

To establish an asymptotic theory for the LR statistic it is assumed that

$$E(\varepsilon_t\varepsilon_t' \mid \mathcal F_{t-1}) = \Omega \quad \text{a.s.}, \tag{2.2}$$

where $\Omega$ is positive definite. For the asymptotic theory for the information criteria this can be relaxed to

$$\liminf_{t\to\infty} \lambda_{\min} E(\varepsilon_t\varepsilon_t' \mid \mathcal F_{t-1}) > 0 \quad \text{a.s.} \tag{2.3}$$

The deterministic term $D_t$ is a vector of terms such as a constant, a linear trend, or periodic functions like seasonal dummies. Inspired by Johansen [13], the deterministic terms are required to satisfy the difference equation

$$D_t = \mathbf D D_{t-1}, \tag{2.4}$$

where $\mathbf D$ has characteristic roots on the complex unit circle. For example,

$$\mathbf D = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \quad \text{with} \quad D_0 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

will generate a constant and a dummy for a biannual frequency. The deterministic term $D_t$ is assumed to have linearly independent coordinates. That is:

$$|\lambda(\mathbf D)| = 1, \qquad \operatorname{rank}(D_1, \dots, D_{\dim\mathbf D}) = \dim\mathbf D. \tag{2.5}$$
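The example of a constant plus a biannual dummy can be checked numerically. The sketch below assumes the matrix $\mathbf D = \operatorname{diag}(1, -1)$ with $D_0 = (1, 1)'$, iterates (2.4), and verifies the two parts of (2.5): unit-modulus eigenvalues and linearly independent coordinates. The variable names are illustrative only.

```python
import numpy as np

# Assumed example matrix for (2.4): a constant and a period-2 (biannual) dummy.
D_matrix = np.array([[1.0, 0.0], [0.0, -1.0]])
D_0 = np.array([1.0, 1.0])

T = 8
D = np.empty((T, 2))
prev = D_0
for t in range(T):
    prev = D_matrix @ prev          # D_t = D D_{t-1}
    D[t] = prev                     # row t holds D_{t+1}

moduli = np.abs(np.linalg.eigvals(D_matrix))   # first part of (2.5): all equal to 1
rank = np.linalg.matrix_rank(D[:2].T)          # rank(D_1, ..., D_{dim D}) with dim D = 2
print(D)
print(moduli, rank)
```

The first coordinate of $D_t$ is constant and the second alternates as $(-1)^t$.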
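In the analysis it is convenient to introduce the companion form

$$\begin{pmatrix} \mathbf X_t \\ D_t \end{pmatrix} = \begin{pmatrix} B & \boldsymbol\mu \\ 0 & \mathbf D \end{pmatrix}\begin{pmatrix} \mathbf X_{t-1} \\ D_{t-1} \end{pmatrix} + \begin{pmatrix} e_t \\ 0 \end{pmatrix}, \tag{2.6}$$

where $\mathbf X_{t-1} = (X_{t-1}', \dots, X_{t-k+1}')'$ and

$$B = \begin{pmatrix} A_1 & \cdots & A_{k-2} & A_{k-1} \\ I_{p(k-2)} & & & 0 \end{pmatrix}, \qquad \iota = \begin{pmatrix} I_p \\ 0 \end{pmatrix}, \qquad \boldsymbol\mu = \iota\mu\mathbf D, \qquad e_t = \iota\varepsilon_t.$$

The process $\mathbf X_t$ can be decomposed using a similarity transformation. Following Herstein ([11], p. 308), there exists a regular, real matrix $M$ that block-diagonalises $B$ so that $MBM^{-1} = \operatorname{diag}(U, V, W)$ is a real block diagonal matrix where the eigenvalues of the diagonal blocks $U, V, W$ satisfy $|\lambda(U)| < 1$, $|\lambda(V)| = 1$, and $|\lambda(W)| > 1$. Any of the blocks $U, V, W$ can be empty matrices, so if for instance $|\lambda(B)| < 1$ then $U = B$ and $\dim V = \dim W = 0$. The process $\mathbf X_t$ can therefore be decomposed as

$$M\mathbf X_t = \begin{pmatrix} U_t \\ V_t \\ W_t \end{pmatrix} = \begin{pmatrix} U & 0 & 0 \\ 0 & V & 0 \\ 0 & 0 & W \end{pmatrix}\begin{pmatrix} U_{t-1} \\ V_{t-1} \\ W_{t-1} \end{pmatrix} + \begin{pmatrix} \mu_U \\ \mu_V \\ \mu_W \end{pmatrix} D_t + \begin{pmatrix} e_{U,t} \\ e_{V,t} \\ e_{W,t} \end{pmatrix}. \tag{2.7}$$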



Finally, there exists a constant $\tilde\mu_U$, see Nielsen ([17], Lemma 2.1), so that

$$U_t = \tilde U_t + \tilde\mu_U D_t, \qquad \text{where} \qquad \tilde U_t = U\tilde U_{t-1} + e_{U,t}. \tag{2.8}$$
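The dimensions of the blocks $U$, $V$, $W$ in (2.7) are the numbers of eigenvalues of the companion matrix $B$ inside, on and outside the unit circle. The sketch below builds $B$ from given coefficient matrices and counts these eigenvalues; it does not construct the transformation $M$ itself. The coefficient values are hypothetical, and the tolerance `tol` is an assumption needed because floating-point eigenvalues are never exactly of modulus one. The point of the results in this paper is that the order selection methods do not require this classification.

```python
import numpy as np


def companion(A_list):
    """Companion matrix B of X_t = A_1 X_{t-1} + ... + A_m X_{t-m} + ..."""
    p, m = A_list[0].shape[0], len(A_list)
    B = np.zeros((p * m, p * m))
    B[:p, :] = np.hstack(A_list)             # first block row: A_1, ..., A_m
    B[p:, :-p] = np.eye(p * (m - 1))         # identity blocks shift the lags down
    return B


def root_dims(B, tol=1e-8):
    """Return (dim U, dim V, dim W): eigenvalue counts inside, on, outside the unit circle."""
    moduli = np.abs(np.linalg.eigvals(B))
    inside = int(np.sum(moduli < 1 - tol))
    on = int(np.sum(np.abs(moduli - 1) <= tol))
    outside = int(np.sum(moduli > 1 + tol))
    return inside, on, outside


# Hypothetical example: component 1 has roots {1, 0.4}, component 2 has roots {1.05, 0.5}.
A1 = np.diag([1.4, 1.55])
A2 = np.diag([-0.4, -0.525])
B = companion([A1, A2])
print(root_dims(B))
```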



2.2. Likelihood ratio test statistics 



The likelihood ratio test statistic is known to be asymptotically $\chi^2$ in the stationary case where $|\lambda(B)| < 1$ and $\mathbf D = 1$, see Lütkepohl ([16], Section 4.2.2). Here the result is shown to hold regardless of the assumptions on $B$ and $\mathbf D$. Thus, the likelihood ratio test can be used before locating the characteristic roots.

Theorem 2.1. Suppose Assumptions (2.1), (2.2), (2.5) are satisfied and $k_0 < k$. Then $LR(A_k = 0)$ is asymptotically $\chi^2(p^2)$.

Since the likelihood ratio test statistic is based on partial correlations, it follows from Theorem 2.1 that partial correlograms computed from these partial correlations can be used regardless of the location of the characteristic roots. Often correlograms are, however, based on the Yule-Walker estimators, which assume stationarity. For non-stationary autoregressions this can lead to misleading inference. Nielsen [3] provides a more detailed discussion.

Remark 2.2. The fourth order moment condition, $\lambda > 4$, in Assumption (2.1) is used twice in the proof. First, to ensure that the residuals from regressing $\varepsilon_t$ on the explosive term $W_{t-1}$ do not depend asymptotically on $W_{t-1}$. As discussed in Remark 3.7, it suffices that $\lambda > 2$ if either of the following conditions holds:

(I,a) $\dim W = 0$.

(I,b) $\dim W > 0$ and $\varepsilon_t$ independent, identically distributed.

Secondly, to ensure that $\varepsilon_t\varepsilon_{t-1}'$ has second moments when applying a Central Limit Theorem. As discussed in Remark 3.12, it suffices that $\lambda > 2$ if

(II) the innovations $\varepsilon_t$ are independent.

The test statistic considered above is for a hypothesis concerning a single lag. This can be generalised to a hypothesis concerning several lags, $m$ say, where $k + m - 1 \le K$.

Theorem 2.3. Suppose Assumptions (2.1), (2.2), (2.5) are satisfied and $k_0 < k$. Then $LR(A_k = \cdots = A_{k+m-1} = 0)$ is asymptotically $\chi^2(p^2 m)$.
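A small Monte Carlo sketch of Theorem 2.1 in the univariate case $p = 1$: data are generated from an AR(1) with a stationary, a unit, and an explosive root, and $LR(A_2 = 0)$ is computed with a constant included in both fits. If the theorem applies, the mean should be close to that of $\chi^2(1)$, which is 1, and the rejection frequency at the nominal 5% level (critical value 3.841) should be near 0.05 in all three cases. The sample size, number of replications, seed and helper names are arbitrary choices; this is an illustration, not a proof.

```python
import numpy as np


def lr_ar(x, k, K):
    """LR(A_k = 0) for a univariate AR with a constant, fitted on t = 1, ..., T."""
    T = len(x) - K
    y = x[K:]

    def rss(n_lags):
        cols = [np.ones(T)] + [x[K - j:K - j + T] for j in range(1, n_lags + 1)]
        Z = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        r = y - Z @ coef
        return r @ r

    return T * (np.log(rss(k - 1)) - np.log(rss(k)))


def simulate_ar1(rho, T, K, rng):
    """x_t = rho x_{t-1} + eps_t started at zero; the first K values act as initial values."""
    x = np.zeros(K + T)
    eps = rng.standard_normal(K + T)
    for i in range(1, K + T):
        x[i] = rho * x[i - 1] + eps[i]
    return x


rng = np.random.default_rng(2)
K, T, reps = 2, 200, 1000
summary = {}
for rho in (0.5, 1.0, 1.05):                  # stationary, unit root, explosive
    lr = np.array([lr_ar(simulate_ar1(rho, T, K, rng), k=2, K=K) for _ in range(reps)])
    summary[rho] = (lr.mean(), np.mean(lr > 3.841))   # chi^2(1): mean 1, 5% level 3.841
print(summary)
```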



2.3. Information criteria 

The next two results concern consistency of a lag length estimator arising from the use of information criteria. The proof has two distinct parts. First, it is argued that the lag length estimator $\hat k$ is not under-estimating, and, secondly, that it is not over-estimating. The first part is the easier one to establish. This result holds for all of the penalty functions discussed in the introduction under weak conditions on the innovations.

Theorem 2.4. Suppose Assumptions (2.1), (2.3), (2.5) are satisfied with $\lambda > 2$ only and $f(T) = o(T)$. Then $\liminf_{T\to\infty} \hat k \ge k_0$ a.s.

This result has previously been established in the univariate case without deterministic terms, so $p = \dim X = 1$ and $\dim\mathbf D = 0$, by Pötscher (1989, Theorem 3.3). For the purely explosive case $|\lambda(B)| > 1$ his Theorem 3.2 shows the above result under the weaker condition $f(T) = o(T^2)$. A version holding in probability has been shown for the non-explosive case $|\lambda(B)| \le 1$ and $\mathbf D = 1$ by Paulsen [20] and Tsay [24].

Results showing that the lag length is not over-estimated are harder to establish. Various weak and strong results can be obtained depending on the number of conditions that are imposed.

Theorem 2.5. Suppose Assumptions (2.1), (2.5) are satisfied. Then

(i) If $f(T) \to \infty$ and Assumption (2.2) holds, then $P(\hat k \le k_0) \to 1$.

(ii) If $f(T)/\log T \to \infty$ and Assumption (2.3) holds, then $\limsup_{T\to\infty} \hat k \le k_0$ a.s.

(iii) If $f(T)/\{(\log\log T)^{1/2}(\log T)^{1/2}\} \to \infty$, Assumption (2.3) holds, and the parameters satisfy the condition (A) that $V$ and $\mathbf D$ have no common eigenvalues, then $\limsup_{T\to\infty} \hat k \le k_0$ a.s.

(iv) If $f(T)/\log\log T \to \infty$, Assumption (2.3) holds, and either (B) $\dim\mathbf D = 0$ with $V = 1$ or (C) $\dim V = 0$, then $\limsup_{T\to\infty} \hat k \le k_0$ a.s.

(v) Suppose Assumption (2.2) holds, and either (B) or (C) holds. Then

(a) if $\liminf_{T\to\infty} (2\log\log T)^{-1} f(T) > p^2$, then $\limsup_{T\to\infty} \hat k \le k_0$ a.s.;

(b) if $\limsup_{T\to\infty} (2\log\log T)^{-1} f(T) < 1$, then $\hat k \not\to k_0$ a.s.

By combining Theorems 2.4 and 2.5, consistency results can be obtained. For instance, Theorem 2.4 in combination with Theorem 2.5(i) shows that $\hat k \to k_0$ in probability if the penalty function satisfies $f(T) \to \infty$ and $f(T) = o(T)$. This includes Hannan and Quinn's and Schwarz's penalty functions, but excludes that of Akaike as usually found. Likewise, Theorem 2.4 in combination with Theorem 2.5(ii) shows that $\hat k \to k_0$ a.s. if the penalty function satisfies $f(T)/\log T \to \infty$ and $f(T) = o(T)$. These results are the first to present conditions on the penalty function ensuring consistency that do not depend on the parameters $B$ and $\mathbf D$. This implies that the information criteria can be used before locating the characteristic roots.

It remains an open problem, however, to establish strong consistency of the Schwarz and the Hannan-Quinn estimators for general values of $V$ and $\mathbf D$. Theorem 2.4 combined with Theorem 2.5(iii) shows that the Schwarz estimator is strongly consistent when (A) holds, so that $V$ and $\mathbf D$ have no common eigenvalues. Theorem 2.4 combined with Theorem 2.5(v) shows that the Hannan-Quinn estimator is strongly consistent when either (B) $\dim\mathbf D = 0$ with $V = 1$ or (C) $\dim V = 0$ holds. This is the first strong consistency result for the Hannan-Quinn estimator in the non-stationary case.
Remark 2.6. In Theorem 2.5 the fourth order moment condition $\lambda > 4$ in Assumption (2.1) can be relaxed to $\lambda > 2$ under certain conditions on the parameters. Recall the conditions stated in Remark 2.2, which are

(I,a) $\dim W = 0$.

(I,b) $\dim W > 0$ and $\varepsilon_t$ independent, identically distributed.

(II) the innovations $\varepsilon_t$ are independent.

As discussed in Remark 3.13, it holds that:

Result (i) can be relaxed if (II) holds along with either (I,a) or (I,b).
Results (ii), (iii), (iv) can be relaxed if (I,a) holds.
Result (v) cannot be relaxed with the present proof.

A number of related results are available in the literature.

The weak consistency result in (i) has been shown for the non-explosive case $|\lambda(B)| \le 1$ and $\mathbf D = 1$ by Paulsen [20] and Tsay [24].

The $(\log\log T)^{1/2}(\log T)^{1/2}$ rate discussed in Theorem 2.5(iii) and Remark 2.6 is an improvement over the $\log T$ rates discussed by, for instance, Pötscher [21] and Wei [25]. These authors discuss the univariate case without deterministic terms, so $p = \dim X = 1$ and $\dim\mathbf D = 0$, in which case $V$ and $\mathbf D$ trivially have no common eigenvalues. First, Pötscher ([21], Theorem 3.1) shows an under-estimation result for rates satisfying $f(T)/\log T \to \infty$ in the non-explosive case $|\lambda(B)| \le 1$, hence $\dim W = 0$, but with Assumption (2.3) replaced by the weaker condition that $\liminf_{T\to\infty} T^{-1}\sum_{t=1}^T E(\varepsilon_t^2|\mathcal F_{t-1}) > 0$ a.s. Pötscher's Theorem 3.2, concerning under-estimation in the purely explosive case $|\lambda(B)| > 1$, requires $\liminf_{T\to\infty} f(T)/T > 0$ a.s. with just $\lambda > 2$ in Assumption (2.1). Remark 1 to his Theorem 3.3 points out that his results do not provide a common feasible rate for autoregressions with both explosive and non-explosive roots, in that $f(T) = o(T)$ is required for the over-estimation result, whereas $\liminf_{T\to\infty} f(T)/T > 0$ a.s. is required for the under-estimation results. Secondly, Theorem 3.6 of Wei [25] goes a step further in showing the over-estimation result for the rate $f(T) = \log T$ for the non-explosive case, so $\dim W = 0$.

The optimal $\log\log T$ rates in (v) were originally suggested by Hannan and Quinn [10] and Quinn [22] for the case where $|\lambda(B)| < 1$ and $\dim\mathbf D = 0$. A full generalisation cannot be made at present, as the proof hinges on proving that the smallest eigenvalue of the average of the squared residuals from regressing $V_{t-1}$ on $D_t$, that is $T^{-1-\eta}\sum_{t=1}^T (V_{t-1}|D_t)(V_{t-1}|D_t)'$, has positive limit points for some $\eta > 0$. This result can only be established in two special cases: first, if $\dim V = 0$ the issue is irrelevant, and secondly, if $V = 1$ and $\dim\mathbf D = 0$ this follows from the law of the iterated logarithm by Donsker and Varadhan. A more detailed discussion is given in Lemma 3.5(iv) in the Appendix.

The strong $\log\log T$ rate in Theorem 2.5(iv) and Remark 2.6 has previously been established in the purely stable, univariate case without deterministic terms, so $p = \dim X = 1$, $\dim\mathbf D = 0$ and $|\lambda(B)| < 1$, and hence $\dim W = 0$; see Pötscher ([21], Theorem 3.4). Once again, his result only requires $\liminf_{T\to\infty} T^{-1}\sum_{t=1}^T E(\varepsilon_t^2|\mathcal F_{t-1}) > 0$ a.s. instead of Assumption (2.3).



2.4. Residual-based mis-specification testing

The third approach is to fit the model (1.1) with $k - 1$ lags and analyse the residuals for autocorrelation of order up to $m$. The maximal lag length parameter $K$ is here required to be at least $k - 1$. This is done in two steps. First, the residuals $\hat\varepsilon_t$ are found for the regression (1.1) with $t = 1, \dots, T$ and $k - 1$ lags. In the second step, $\hat\varepsilon_t$ is analysed in an auxiliary regression for $t = m + 1, \dots, T$, where $\hat\varepsilon_t$ is regressed on $\hat\varepsilon_{t-1}, \dots, \hat\varepsilon_{t-m}$ as well as the original regressors $\mathbf X_{t-1} = (X_{t-1}', \dots, X_{t-k+1}')'$ and $D_t$. The original regressors are included to mimic the above likelihood analysis, where $\mathbf X_{t-1}, D_t$ are partialled out from $X_t$ and $X_{t-k}$. A test based on the squared sample correlation of the variables in the auxiliary regression is asymptotically equivalent to the likelihood ratio tests, so the degrees of freedom do not include the dimension of $\mathbf X_{t-1}, D_t$. In the multivariate case, $p > 1$, the test can be implemented in three ways, using either a simultaneous test, a marginal test or a conditional test.

The joint test is based on the test statistic $\operatorname{tr}(TR^2)$, where $R^2$ is the squared sample multiple correlation of $\hat\varepsilon_t$ and $(\hat\varepsilon_{t-1}', \dots, \hat\varepsilon_{t-m}', \mathbf X_{t-1}', D_t')'$.
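The joint test can be sketched as follows. The code assumes that $R^2$ is formed from uncentred sample moments, $R^2 = S_{\varepsilon\varepsilon}^{-1}S_{\varepsilon z}S_{zz}^{-1}S_{z\varepsilon}$, with $z_t$ collecting the lagged residuals and the original regressors, and that $D_t$ contains a constant. The function name `residual_test_joint` and the simulated data are hypothetical. Under $k_0 < k$ the statistic would be compared with $\chi^2(p^2 m)$ quantiles.

```python
import numpy as np


def residual_test_joint(X, D, k, m, K):
    """tr(T R^2) from the auxiliary regression of residuals on their own lags.

    Step 1: regress X_t on X_{t-1}, ..., X_{t-k+1} and D_t for t = 1..T.
    Step 2: for t = m+1..T regress the residuals on their m lags and the
    original regressors; R^2 = S_ee^{-1} S_ez S_zz^{-1} S_ze (uncentred
    moments, assuming D_t contains a constant). D must have >= 1 column.
    """
    T = X.shape[0] - K
    y = X[K:]
    R = np.hstack([X[K - j:K - j + T] for j in range(1, k)] + [D])
    coef, *_ = np.linalg.lstsq(R, y, rcond=None)
    e = y - R @ coef                                      # residuals for t = 1..T
    lags = [e[m - j:T - j] for j in range(1, m + 1)]      # e_{t-j} for t = m+1..T
    Z = np.hstack(lags + [R[m:]])
    ea = e[m:]
    S_ez = ea.T @ Z
    R2 = np.linalg.solve(ea.T @ ea, S_ez @ np.linalg.solve(Z.T @ Z, S_ez.T))
    return T * np.trace(R2)


# Hypothetical data: bivariate VAR(1) with one unit root, so k_0 = 1 < k = 2.
rng = np.random.default_rng(6)
p, K, T, k, m = 2, 4, 300, 2, 1
A1 = np.array([[1.0, 0.0], [0.5, 0.5]])
mu = np.array([0.0, 0.3])
X = np.zeros((K + T, p))
for i in range(1, K + T):
    X[i] = A1 @ X[i - 1] + mu + rng.standard_normal(p)
D = np.ones((T, 1))

stat = residual_test_joint(X, D, k=k, m=m, K=K)   # compare with chi^2(p^2 m) = chi^2(4)
print(stat)
```

The eigenvalues of $R^2$ are squared canonical correlations, so $\operatorname{tr}(TR^2)$ lies between $0$ and $Tp$.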

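The other two tests are based on a $q$-dimensional subset of the $p$ components of $\hat\varepsilon_t$. As the equations in the model equation (1.1) can be permuted, there is no loss of generality in focussing on the first $q$ components. Thus, partition

$$\hat\varepsilon_t = \begin{pmatrix} \hat\varepsilon_{t,1} \\ \hat\varepsilon_{t,2} \end{pmatrix}, \qquad X_t = \begin{pmatrix} X_{t,1} \\ X_{t,2} \end{pmatrix},$$

where $\hat\varepsilon_{t,1}$ and $X_{t,1}$ are $q$-dimensional.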

The marginal model consists of the first $q$ equations of (1.1), that is $X_{t,1}$ given $\mathbf X_{t-1}, D_t$. The marginal test is then based on the squared sample multiple correlation, $R^2_{\rm marg}$ say, of $\hat\varepsilon_{t,1}$ and $(\hat\varepsilon_{t-1,1}', \dots, \hat\varepsilon_{t-m,1}', \mathbf X_{t-1}', D_t')'$.

The conditional model consists of the first $q$ equations of (1.1) given $X_{t,2}$, that is $X_{t,1}$ given $X_{t,2}, \mathbf X_{t-1}, D_t$. The conditional test is based on the squared sample multiple correlation, $R^2_{\rm cond}$ say, of $\hat\varepsilon_{t,1}$ and $(\hat\varepsilon_{t-1,1}', \dots, \hat\varepsilon_{t-m,1}', X_{t,2}', \mathbf X_{t-1}', D_t')'$.

The following asymptotic result can be established.

Theorem 2.7. Suppose Assumptions (2.1), (2.2), (2.5) are satisfied and $k_0 < k$. Then $\operatorname{tr}(TR^2)$ is asymptotically $\chi^2(p^2 m)$, while $\operatorname{tr}(TR^2_{\rm marg})$ and $\operatorname{tr}(TR^2_{\rm cond})$ are asymptotically $\chi^2(q^2 m)$.

Sometimes these tests are implemented so that the auxiliary regression is carried out for $t = 1, \dots, T$ rather than $t = m + 1, \dots, T$, with the convention that $\hat\varepsilon_0 = \cdots = \hat\varepsilon_{1-m} = 0$. Variants of the tests have been considered, in particular for the univariate case, by Durbin, Godfrey, Breusch and Pagan. Those variants have been argued to be score/Lagrange multiplier type tests, and asymptotic theory has been established for the stationary case $|\lambda(B)| < 1$.

3. Proofs 

The likelihood ratio test statistic for testing $A_k = 0$ is given by

$$LR(A_k = 0) = -T\log\det(\hat\Omega_{k-1}^{-1}\hat\Omega_k) = -T\log\det\{I_p - \hat\Omega_{k-1}^{-1}(\hat\Omega_{k-1} - \hat\Omega_k)\}, \tag{3.1}$$

where $\hat\Omega_k$ and $\hat\Omega_{k-1}$ represent the unrestricted and restricted maximum likelihood estimators for the variance matrix defined below. In the following, some notation is first introduced. Then comes an asymptotic analysis of $\hat\Omega_{k-1}$ and $\hat\Omega_{k-1} - \hat\Omega_k$, and finally proofs of the main theorems follow.
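The two expressions in (3.1) agree because $I_p - \hat\Omega_{k-1}^{-1}(\hat\Omega_{k-1} - \hat\Omega_k) = \hat\Omega_{k-1}^{-1}\hat\Omega_k$, and both equal the difference of log determinants given in the Introduction. The following quick numerical check uses arbitrary, hypothetical positive definite matrices with $\hat\Omega_{k-1} - \hat\Omega_k$ positive semi-definite, as happens for nested least-squares fits on a common sample.

```python
import numpy as np

rng = np.random.default_rng(3)
p, T = 3, 100
# Hypothetical nested variance estimates: Omega_{k-1} = Omega_k + (positive semi-definite),
# as when a lag is dropped from a least-squares fit on a fixed sample.
G = rng.standard_normal((T, p))
omega_k = G.T @ G / T
H = rng.standard_normal((p, p))
omega_km1 = omega_k + H @ H.T / T

# T log det Omega_{k-1} - T log det Omega_k
lr_logdet = T * (np.linalg.slogdet(omega_km1)[1] - np.linalg.slogdet(omega_k)[1])
# -T log det{I_p - Omega_{k-1}^{-1}(Omega_{k-1} - Omega_k)}, the right-hand side of (3.1)
inner = np.eye(p) - np.linalg.solve(omega_km1, omega_km1 - omega_k)
lr_31 = -T * np.linalg.slogdet(inner)[1]
print(lr_logdet, lr_31)
```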
3.1. Notation

It is convenient to introduce some notation to handle $\hat\Omega_{k-1}$ as well as $\hat\Omega_{k-1} - \hat\Omega_k$. Thus, let the residuals from the partial regressions of $X_t$ and $X_{t-k}$ on $\mathbf X_{t-1} = (X_{t-1}', \dots, X_{t-k+1}')'$ and the deterministic components $D_t$ be denoted

$$(X_t|\mathbf X_{t-1}, D_t), \qquad (X_{t-k}|\mathbf X_{t-1}, D_t).$$

When the hypothesis $A_k = 0$ is satisfied, then $(X_t|\mathbf X_{t-1}, D_t) = (\varepsilon_t|\mathbf X_{t-1}, D_t)$ and therefore the restricted variance estimator is given by

$$\hat\Omega_{k-1} = \frac1T\sum_{t=1}^T (\varepsilon_t|\mathbf X_{t-1}, D_t)(\varepsilon_t|\mathbf X_{t-1}, D_t)'. \tag{3.2}$$

Most of the analysis in the proof relates to $\hat\Omega_{k-1} - \hat\Omega_k$, so it is helpful to define

$$Q(Z_t) = \sum_{t=1}^T \varepsilon_t Z_t'\Big(\sum_{t=1}^T Z_t Z_t'\Big)^{-1}\sum_{t=1}^T Z_t\varepsilon_t'$$

for any time series $Z_t$. It follows that $T(\hat\Omega_{k-1} - \hat\Omega_k) = Q(X_{t-k}|\mathbf X_{t-1}, D_t)$. Occasionally the following notation will be used: for a matrix $a$ let $a^{\otimes 2} = aa'$.
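The identity $T(\hat\Omega_{k-1} - \hat\Omega_k) = Q(X_{t-k}|\mathbf X_{t-1}, D_t)$ is exact under $A_k = 0$. By the Frisch-Waugh argument, the reduction in the residual sum of squares is the projection of $X_t$ onto the partialled-out $X_{t-k}$, and that partialled-out regressor is orthogonal to every part of $X_t$ except $\varepsilon_t$. The sketch below checks this numerically on simulated data where the innovations are known. The helper names `partial_out` and `Q` and all parameter values are hypothetical.

```python
import numpy as np


def partial_out(Y, R):
    """Residuals (Y | R) from the least-squares regression of Y on R."""
    coef, *_ = np.linalg.lstsq(R, Y, rcond=None)
    return Y - R @ coef


def Q(eps, Z):
    """Q(Z_t) = sum eps_t Z_t' (sum Z_t Z_t')^{-1} sum Z_t eps_t'."""
    S = eps.T @ Z
    return S @ np.linalg.solve(Z.T @ Z, S.T)


# Hypothetical data from a bivariate model with k - 1 = 2 lags, so A_3 = 0 holds.
rng = np.random.default_rng(4)
p, T, k = 2, 250, 3
K = k
A1 = np.diag([1.4, 0.5])
A2 = np.diag([-0.4, 0.4])
mu = np.array([0.0, 0.3])
eps = rng.standard_normal((K + T, p))
X = np.zeros((K + T, p))                   # rows: X_{1-K}, ..., X_0, X_1, ..., X_T
for i in range(2, K + T):
    X[i] = A1 @ X[i - 1] + A2 @ X[i - 2] + mu + eps[i]


def lag(j):
    """X_{t-j} for t = 1, ..., T."""
    return X[K - j:K - j + T]


y, e, D = X[K:], eps[K:], np.ones((T, 1))
R = np.hstack([lag(j) for j in range(1, k)] + [D])    # bold X_{t-1} and D_t
u_restricted = partial_out(y, R)
u_unrestricted = partial_out(y, np.hstack([R, lag(k)]))
lhs = u_restricted.T @ u_restricted - u_unrestricted.T @ u_unrestricted  # T(Omega_{k-1} - Omega_k)
rhs = Q(e, partial_out(lag(k), R))                                        # Q(X_{t-k} | X_{t-1}, D_t)
print(np.max(np.abs(lhs - rhs)))
```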

3.2. Asymptotic analysis of $\hat\Omega_{k-1}$

Asymptotic expressions for the restricted least squares variance estimator $\hat\Omega_{k-1}$ are given by Nielsen ([17], Corollary 2.6, Theorem 2.8):

Lemma 3.1. Suppose $A_k = 0$ and that the Assumptions (2.1), (2.3), (2.5) are satisfied with $\lambda > 2$. Then, for all $\xi < 1 - 2/\lambda$ it holds

$$\hat\Omega_{k-1} \overset{\rm a.s.}{=} \frac1T\sum_{t=1}^T \varepsilon_t\varepsilon_t' + o(T^{-\xi}).$$

If in addition Assumption (2.2) is satisfied then for all $\zeta < \min(\xi, 1/2)$ it holds

3.3. Asymptotic analysis of $\hat\Omega_{k-1} - \hat\Omega_k$

The analysis of the term $\hat\Omega_{k-1} - \hat\Omega_k$ is specific to the order selection problem. For the sake of finding the asymptotic distribution of the likelihood ratio test statistic, the aim is to express $\hat\Omega_{k-1} - \hat\Omega_k$ in terms of a stationary process $Y_t$ as

$$T(\hat\Omega_{k-1} - \hat\Omega_k) = Q(X_{t-k}|\mathbf X_{t-1}, D_t) = Q(Y_{t-1}) + o_P(1), \tag{3.3}$$

which in turn can be proved to be asymptotically $\chi^2$ by a Central Limit Theorem. The result (3.3) reduces trivially to an equality with $Y_{t-1} = \varepsilon_{t-1}$ when testing $A_1 = 0$, so only the case $k > 1$ will need consideration in the remainder of this subsection. On the way to proving the above result, some related expressions holding under weaker assumptions emerge, which can be used for proving the consistency results for the estimator of the lag length, $\hat k$.

In the following, $\hat\Omega_{k-1} - \hat\Omega_k$ is first decomposed into seven terms. It is then shown that the three leading terms can be written as $Q(Y_{t-1})$ as in (3.3) and that the remaining four terms are asymptotically vanishing.
3.3.1. Decomposition of $\hat\Omega_{k-1} - \hat\Omega_k$

The first decomposition is a purely algebraic result based on the formula for partitioned inversion.

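Lemma 3.2. Suppose $A_k = 0$. Then it holds

$$Q(X_{t-k}|\mathbf X_{t-1}, D_t) = Q(\mathbf X_{t-2}|D_t) - Q(\mathbf X_{t-1}|D_t) + Q(\varepsilon_{t-1}|\mathbf X_{t-2}, D_t).$$

Proof of Lemma 3.2. By the formula for partitioned inversion it holds

$$Q\left\{\begin{pmatrix} \mathbf X_{t-1} \\ X_{t-k} \end{pmatrix}\middle| D_t\right\} = Q(X_{t-k}|\mathbf X_{t-1}, D_t) + Q(\mathbf X_{t-1}|D_t), \tag{3.4}$$

of which $T(\hat\Omega_{k-1} - \hat\Omega_k) = Q(X_{t-k}|\mathbf X_{t-1}, D_t)$ is the first term on the right. Noting that $(\mathbf X_{t-1}', X_{t-k}')' = (X_{t-1}', \mathbf X_{t-2}')'$, a repeated use of the formula for partitioned inversion shows

$$Q\left\{\begin{pmatrix} \mathbf X_{t-1} \\ X_{t-k} \end{pmatrix}\middle| D_t\right\} = Q(X_{t-1}|\mathbf X_{t-2}, D_t) + Q(\mathbf X_{t-2}|D_t). \tag{3.5}$$

Due to the model equation (1.1) with $A_k = 0$ and the property $D_t = \mathbf D D_{t-1}$, it follows that $(X_{t-1}|\mathbf X_{t-2}, D_t) = (\varepsilon_{t-1}|\mathbf X_{t-2}, D_t)$. The desired expression then arises by rearranging the above expressions. □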
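Lemma 3.2 is an exact algebraic identity, so it can be verified to machine precision on simulated data where $\varepsilon_t$ is known. The sketch assumes $k = 3$, a constant $D_t = 1$ (so $\mathbf D = 1$ and $D_{t-1} = D_t$), and hypothetical coefficient values. `Q_given` is an illustrative helper that computes $Q(Z_t|R_t)$.

```python
import numpy as np


def Q_given(eps, Z, R):
    """Q(Z_t | R_t): Q applied to the residuals of Z_t regressed on R_t."""
    coef, *_ = np.linalg.lstsq(R, Z, rcond=None)
    Zr = Z - R @ coef
    S = eps.T @ Zr
    return S @ np.linalg.solve(Zr.T @ Zr, S.T)


# Hypothetical data satisfying A_k = 0 with k = 3 (two lags) and D_t = 1;
# the model equation also holds at t = 0, which the lemma uses for eps_{t-1}.
rng = np.random.default_rng(5)
p, T, k = 2, 250, 3
K = k
A1 = np.diag([1.4, 0.5])
A2 = np.diag([-0.4, 0.4])
mu = np.array([0.0, 0.3])
eps = rng.standard_normal((K + T, p))
X = np.zeros((K + T, p))               # rows: X_{1-K}, ..., X_0, X_1, ..., X_T
for i in range(2, K + T):
    X[i] = A1 @ X[i - 1] + A2 @ X[i - 2] + mu + eps[i]


def lag(j):
    """X_{t-j} for t = 1, ..., T."""
    return X[K - j:K - j + T]


D = np.ones((T, 1))
e_t = eps[K:K + T]                                    # epsilon_t
e_lag = eps[K - 1:K - 1 + T]                          # epsilon_{t-1}
Xb1 = np.hstack([lag(j) for j in range(1, k)])        # bold X_{t-1} = (X_{t-1}, X_{t-2})
Xb2 = np.hstack([lag(j) for j in range(2, k + 1)])    # bold X_{t-2} = (X_{t-2}, X_{t-3})

lhs = Q_given(e_t, lag(k), np.hstack([Xb1, D]))
rhs = (Q_given(e_t, Xb2, D) - Q_given(e_t, Xb1, D)
       + Q_given(e_t, e_lag, np.hstack([Xb2, D])))
print(np.max(np.abs(lhs - rhs)))
```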

Asymptotic arguments are now needed. These arguments rely on Nielsen [17], which in turn represents a generalisation of the arguments of Lai and Wei [15]. The second step is therefore an asymptotic decomposition of the first two terms in Lemma 3.2, using that the processes $U_t, V_t, W_t$ are asymptotically uncorrelated.



Lemma 3.3. Suppose $A_k = 0$ and that the Assumptions (2.1), (2.3), (2.5) are satisfied with $\lambda > 2$. Then, for $j = 1, 2$,

$$Q(\mathbf X_{t-j}|D_t) \overset{\rm a.s.}{=} Q(U_{t-j}|D_t) + Q(V_{t-j}|D_t) + Q(W_{t-j}|D_t) + o(1). \tag{3.6}$$

Proof of Lemma 3.3. Since $M\mathbf X_t = (U_t', V_t', W_t')'$, see (2.7), it suffices to argue that the processes $U_t$, $V_t$ and $W_t$ are asymptotically uncorrelated, so that the off-diagonal elements of $\sum_{t=1}^T (M\mathbf X_{t-j}|D_t)(M\mathbf X_{t-j}|D_t)'$ can be ignored in the asymptotic argument. This follows from Nielsen ([17], Theorems 6.4, 9.1, 9.2, 9.4), see also the summary in Table 2 of that paper. □



3.3.2. Eliminating explosive terms and regressors in stationary terms

In combination, Lemmas 3.2 and 3.3 show that

$$\begin{aligned} T(\hat\Omega_{k-1} - \hat\Omega_k) \overset{\rm a.s.}{=}\ & Q(\varepsilon_{t-1}|\mathbf X_{t-2}, D_t) + Q(U_{t-2}|D_t) - Q(U_{t-1}|D_t) \\ & + Q(V_{t-2}|D_t) - Q(V_{t-1}|D_t) + Q(W_{t-2}|D_t) - Q(W_{t-1}|D_t) + o(1). \end{aligned}$$

Under mild conditions this can be reduced further so as to eliminate the terms involving the explosive component $W_t$ as well as the regressors in the terms involving the stationary component $U_t$.
Lemma 3.4. Suppose $A_k = 0$ and that the Assumptions (2.1), (2.3), (2.5) are satisfied, with $\lambda > 2$. Then,

$$T(\hat\Omega_{k-1} - \hat\Omega_k) \overset{\rm a.s.}{=} Q(\varepsilon_{t-1}) + Q(\tilde U_{t-2}) - Q(\tilde U_{t-1}) + R_\varepsilon + R_V + o(1), \tag{3.7}$$

where

$$R_\varepsilon = Q(\varepsilon_{t-1}|\mathbf X_{t-2}, D_t) - Q(\varepsilon_{t-1}), \qquad R_V = Q(V_{t-2}|D_t) - Q(V_{t-1}|D_t). \tag{3.8}$$

Proof of Lemma 3.4. It suffices to prove, for $j = 1, 2$,

$$Q(U_{t-j}|D_t) \overset{\rm a.s.}{=} Q(\tilde U_{t-j}) + o(1), \tag{3.9}$$

$$Q(W_{t-2}|D_t) - Q(W_{t-1}|D_t) \overset{\rm a.s.}{=} o(1). \tag{3.10}$$

First, consider (3.9). Because of (2.8), $(U_{t-j}|D_t) = (\tilde U_{t-j}|D_t)$. According to Nielsen ([17], Theorem 6.4) it holds for any $\eta > 0$ that

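$$\Big(\sum_{t=1}^T \tilde U_{t-j}\tilde U_{t-j}'\Big)^{-1/2}\sum_{t=1}^T \tilde U_{t-j}D_t'\Big(\sum_{t=1}^T D_tD_t'\Big)^{-1/2} \overset{\rm a.s.}{=} o(T^{\eta-1/2}),$$

while Theorem 6.2 of the above paper shows that $T^{-1}\sum_{t=1}^T \tilde U_{t-j}\tilde U_{t-j}'$ has positive definite limit points. This implies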

T T 

Theorem 2.4 of the above paper shows $\sum_{t=1}^T \varepsilon_t D_t'(\sum_{t=1}^T D_tD_t')^{-1/2} = o(T^\eta)$, implying

$$\sum_{t=1}^T \varepsilon_t(\tilde U_{t-j}|D_t)' \overset{\rm a.s.}{=} \sum_{t=1}^T \varepsilon_t\tilde U_{t-j}' + o(T^{2\eta}).$$

That theorem also shows $\sum_{t=1}^T \varepsilon_t\tilde U_{t-j}'(\sum_{t=1}^T \tilde U_{t-j}\tilde U_{t-j}')^{-1/2} = o(T^\eta)$. In combination these results show the desired result.

Secondly, consider (3.10). Note first that $W_{t-1} = WW_{t-2} + \mu_W D_{t-1} + e_{W,t-1}$ by (2.7), while $D_{t-1} = \mathbf D^{-1}D_t$, implying $(W_{t-1}|D_t) = (WW_{t-2} + e_{W,t-1}|D_t)$. This gives rise to the expansions



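$$\sum_{t=1}^T (W_{t-1}|D_t)^{\otimes 2} = \sum_{t=1}^T (WW_{t-2}|D_t)^{\otimes 2}(1 + f_T),$$

$$\sum_{t=1}^T (W_{t-1}|D_t)\varepsilon_t' = \sum_{t=1}^T (WW_{t-2}|D_t)\varepsilon_t' + c_T,$$

where $f_T = O(d_T^{-1/2}a_T) + d_T^{-1}b_T$ and

$$a_T = d_T^{-1/2}\sum_{t=1}^T (WW_{t-2}|D_t)e_{W,t-1}', \qquad b_T = \sum_{t=1}^T (e_{W,t-1}|D_t)^{\otimes 2},$$

$$c_T = \sum_{t=1}^T (e_{W,t-1}|D_t)\varepsilon_t', \qquad d_T = \sum_{t=1}^T (WW_{t-2}|D_t)^{\otimes 2}.$$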



Using Nielsen ([17], Theorems 2.4, 6.2, 6.4) it is seen that

$$b_T \overset{\rm a.s.}{=} O(T), \qquad c_T \overset{\rm a.s.}{=} o(T^{1/2+\eta}).$$

It follows from Nielsen ([17], Theorems 2.4, 9.1 and Corollary 7.2) that

$$Q(W_{t-j}|D_t) \overset{\rm a.s.}{=} o(T), \qquad a_T \overset{\rm a.s.}{=} o(T^{1/2}), \qquad d_T^{-1} \overset{\rm a.s.}{=} o(\rho^{-T}),$$

for some $\rho > 1$. This implies that $f_T$ is exponentially decreasing. The desired result follows by expanding $Q(W_{t-1}|D_t)$ in terms of $Q(W_{t-2}|D_t)$ as

Q (Wt_ 2 |A) + d T 1/2 c T {g (W t _ 2 |A)} 1/2 + c^or] (1 + fx) , 
and using the established orders of magnitude. □ 

3.3.3. Eliminating unit root terms and regressors in innovation terms 

The terms $R_V$ and $R_\varepsilon$ defined in (3.8) are now shown to vanish asymptotically. First, consider $R_V$, which consists of the terms involving the unit root components $V_t$. Several results are given, of which the strongest result for $R_V$ can only be established for certain values of the parameters.

Lemma 3.5. Suppose $A_k = 0$ and that the Assumptions (2.1), (2.3), (2.5) are satisfied with $\lambda > 2$. Then

(i) $R_V \overset{\rm a.s.}{=} O(\log T)$,

(ii) $R_V = o_P(1)$ if also Assumption (2.2) holds,

(iii) $R_V \overset{\rm a.s.}{=} O\{(\log\log T)^{1/2}(\log T)^{1/2}\}$ if (A) $\mathbf D$ and $V$ have no common eigenvalues,

(iv) $R_V \overset{\rm a.s.}{=} o(1)$ if (B) $\dim\mathbf D = 0$ and $V = 1$,

(v) $R_V = 0$ if (C) $\dim V = 0$.

Proof of Lemma 3.5. (i) This follows since $Q(V_{t-j}|D_t) \overset{\rm a.s.}{=} O(\log T)$ according to Nielsen ([17], Theorem 2.4).

(ii) The type of argument used for (3.10) in the proof of Lemma 3.4 can be used. Replacing $W$ with $V$ throughout, the asymptotic properties of $a_T, b_T, c_T, d_T$ have to be explored. For $b_T, c_T$ the argument is the same so, for all $\eta > 0$,

$$b_T \overset{\rm a.s.}{=} O(T), \qquad c_T \overset{\rm a.s.}{=} o(T^{1/2+\eta}),$$

whereas using Nielsen ([17], Theorem 2.4) for $a_T$ and the techniques of Chan and Wei for $d_T$ shows, for all $\eta > 0$,

$$a_T \overset{\rm a.s.}{=} o(T^\eta), \qquad d_T^{-1} = o_P(T^{-1-4\eta}),$$

so $f_T = o_P(T^{-4\eta})$. Since $Q(V_{t-j}|D_t) \overset{\rm a.s.}{=} O(\log T)$ as established in (i), the desired result follows by expanding $Q(V_{t-1}|D_t)$ in terms of $Q(V_{t-2}|D_t)$.

(iii) Define the vector $S_{t-1} = (V_{t-1}', D_t')'$. By partitioned inversion it holds

$$Q(S_{t-1}) = Q(V_{t-1}|D_t) + Q(D_t).$$

By an invariance argument $D_t$ can be replaced by $D_{t-j}$, and thus it follows that

$$R_V = Q(V_{t-2}|D_t) - Q(V_{t-1}|D_t) = Q(S_{t-2}) - Q(S_{t-1}).$$
Due to (2.4) and (2.7) the process $S_t$ satisfies $S_t = \mathbf S S_{t-1} + e_{S,t}$ for a matrix $\mathbf S$ with eigenvalues of length one and $e_{S,t} = (e_{V,t}', 0')'$. It then follows that

$$\sum_{t=1}^T \varepsilon_t S_{t-1}' = \sum_{t=1}^T \varepsilon_t(S_{t-2}'\mathbf S' + e_{S,t-1}').$$

Inserting this expression into $Q(S_{t-1})$ shows

$$Q(S_{t-1}) = \sum_{t=1}^T \varepsilon_t S_{t-1}'\Big(\sum_{t=1}^T S_{t-1}^{\otimes 2}\Big)^{-1}\sum_{t=1}^T S_{t-1}\varepsilon_t' = Q_A + Q_B + Q_C + Q_C',$$

where

$$Q_A = Q_1Q_2Q_1', \qquad Q_B = Q_4Q_3Q_3'Q_4', \qquad Q_C = Q_1Q_2^{1/2}Q_3'Q_4'$$

are defined in terms of the statistics

$$Q_1 = \sum_{t=1}^T \varepsilon_t e_{S,t-1}', \qquad Q_2 = \Big(\sum_{t=1}^T S_{t-1}^{\otimes 2}\Big)^{-1},$$

$$Q_3 = \Big(\sum_{t=1}^T S_{t-2}^{\otimes 2}\Big)^{1/2}\mathbf S'\Big(\sum_{t=1}^T S_{t-1}^{\otimes 2}\Big)^{-1/2}, \qquad Q_4 = \sum_{t=1}^T \varepsilon_t S_{t-2}'\Big(\sum_{t=1}^T S_{t-2}^{\otimes 2}\Big)^{-1/2}.$$

The orders of magnitude of these follow from a series of results in Nielsen [17]. Theorem 6.1 and Lemma 6.3 imply $Q_1 \overset{\rm a.s.}{=} O\{(T\log\log T)^{1/2}\}$. Theorem 8.3 shows $Q_2 \overset{\rm a.s.}{=} O(T^{-1})$ when $\mathbf D$ and $V$ have no common eigenvalues. Lemma 8.7(ii) shows $Q_3^{\otimes 2} - I \overset{\rm a.s.}{=} O\{T^{-1/2}(\log T)^{1/2}\}$. Theorem 2.4 shows $Q_4 \overset{\rm a.s.}{=} O\{(\log T)^{1/2}\}$. Noting that $Q(S_{t-2}) = Q_4Q_4'$, this in turn implies

$$Q_A = O(\log\log T), \qquad Q_B = Q(S_{t-2}) + O\{T^{-1/2}(\log T)^{3/2}\}, \qquad Q_C = O\{(\log\log T)^{1/2}(\log T)^{1/2}\},$$

and the desired result follows.

(iv) Donsker and Varadhan's Law of the Iterated Logarithm for the integrated squared Brownian motion states

$$\liminf_{T\to\infty}\frac{\log\log T}{T^2}\int_0^T B_u^2\,du \overset{\rm a.s.}{=} \frac14.$$

Now use either the argument in (ii) with $d_T^{-1} = O(T^{-2}\log\log T)$ or the argument in (iii) with $Q_2 \overset{\rm a.s.}{=} O(T^{-2}\log\log T)$, so $Q_A, Q_B, Q_C$ are all $o(1)$.

(v) This follows by construction. □

Now, consider $R_\varepsilon$ defined in (3.8). By showing that this vanishes, it follows that the regressors can be excluded asymptotically in the term involving the lagged innovations $\varepsilon_{t-1}$. A fourth order moment condition is now needed in Assumption (2.1).

Lemma 3.6. Suppose $A_k = 0$ and that the Assumptions (2.1), (2.3), (2.5) are satisfied, now with $\lambda > 4$. Then

$$R_\varepsilon = Q(\varepsilon_{t-1}|\mathbf X_{t-2}, D_t) - Q(\varepsilon_{t-1}) \overset{\rm a.s.}{=} o(1).$$
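Proof of Lemma 3.6. Define the vector $S_t = (\mathbf X_{t-2}', D_t')'$. According to Nielsen ([17], Theorem 2.4) it holds that, for any $\eta > 0$, the terms

$$\Big(\sum_{t=1}^T S_tS_t'\Big)^{-1/2}\sum_{t=1}^T S_t\varepsilon_t', \qquad \Big(\sum_{t=1}^T S_tS_t'\Big)^{-1/2}\sum_{t=1}^T S_t\varepsilon_{t-1}' \tag{3.11}$$

are $o(T^{1/4+\eta})$ when indeed $\lambda > 4$. It then holds that

$$\sum_{t=1}^T \varepsilon_t(\varepsilon_{t-1}|S_t)' = \sum_{t=1}^T \varepsilon_t\varepsilon_{t-1}' - \sum_{t=1}^T \varepsilon_tS_t'\Big(\sum_{t=1}^T S_tS_t'\Big)^{-1}\sum_{t=1}^T S_t\varepsilon_{t-1}' \overset{\rm a.s.}{=} \sum_{t=1}^T \varepsilon_t\varepsilon_{t-1}' + o(T^{1/2+2\eta}),$$

$$\sum_{t=1}^T (\varepsilon_{t-1}|S_t)^{\otimes 2} = \sum_{t=1}^T \varepsilon_{t-1}\varepsilon_{t-1}' - \sum_{t=1}^T \varepsilon_{t-1}S_t'\Big(\sum_{t=1}^T S_tS_t'\Big)^{-1}\sum_{t=1}^T S_t\varepsilon_{t-1}' \overset{\rm a.s.}{=} \sum_{t=1}^T \varepsilon_{t-1}\varepsilon_{t-1}' + o(T^{1/2+2\eta}),$$

where the requirement $\lambda > 4$ is only needed in the first case. Theorems 2.5 and 6.1 of the above paper show that $T^{-1}\sum_{t=1}^T \varepsilon_{t-1}\varepsilon_{t-1}'$ has positive definite limit points, while $\sum_{t=1}^T \varepsilon_t\varepsilon_{t-1}'(\sum_{t=1}^T \varepsilon_{t-1}\varepsilon_{t-1}')^{-1/2} = o(T^\eta)$. Combine these results. □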

Remark 3.7. In Lemma 3.6 a fourth moment condition comes in through the requirement that $\lambda > 4$ in Assumption (2.1). This can be relaxed to $\lambda > 2$ under one of two alternative assumptions.

(I,a) If $\dim W = 0$, then the terms in (3.11) are $o(T^\eta)$, see Nielsen ([17], Theorem 2.4), and the main result holds.

(I,b) If $\dim W > 0$ but the innovations $\varepsilon_t$ are independently, identically distributed, then terms of the type $(\sum_{t=1}^T W_{t-1}W_{t-1}')^{-1/2}\sum_{t=1}^T W_{t-1}\varepsilon_t'$ converge in distribution, see Anderson [1], and the result of the Theorem holds, albeit only in probability.



3.3.4. The leading term of $\hat\Omega_{k-1} - \hat\Omega_k$



First the order of magnitude of the leading term in (3.7) is established in an almost sure sense. This can be done under weak moment conditions. Subsequently the distribution of the leading term is investigated.

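Lemma 3.8. Suppose $A_k = 0$ and that Assumptions (2.1), (2.3) are satisfied with $\lambda > 2$. Define $\Sigma_T = T^{-1}\sum_{t=1}^T\varepsilon_t\varepsilon_t'$. Then

$$\limsup_{T\to\infty}(2\log\log T)^{-1}\,\mathrm{tr}[\{Q(\varepsilon_{t-1}) + Q(U_{t-2}) - Q(U_{t-1})\}\Sigma_T^{-1}] \overset{a.s.}{=} O(1).$$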
Proof of Lemma 3.8. This follows by noting that the normalising sequences involved are relatively compact with positive definite limiting points, due to Lemma 3.1 and Lai and Wei ([15], Theorem 2), and otherwise following the argument in the proof of Pötscher ([21], Theorem 3.4). □



When it comes to analysing the distribution of the leading term in (3.7), it is convenient to show that it can be written as a single quadratic form $Q(Y_{t-1})$ for some process $Y_{t-1}$. This argument requires two steps, of which the first is concerned with the convergence properties of $T^{-1}\sum_{t=1}^T U_{t-1}U_{t-1}'$. As the argument involves a variance matrix, Assumption (2.2) is now called upon.
Lemma 3.9. Suppose $A_k = 0$ and that Assumptions (2.1), (2.2) are satisfied with $\lambda > 2$. Let $M_U$ be the matrix defined in Section 2 by $\varepsilon_{U,t} = M_U\varepsilon_t$ and define

$$F = \sum_{t=0}^\infty U^tM_U\Omega M_U'(U^t)'.$$

Then for all $\zeta < \min(1 - 2/\lambda, 1/2)$ it holds that

$$T^{-1}\sum_{t=1}^T U_{t-1}U_{t-1}' \overset{a.s.}{=} F + o(T^{-\zeta}).$$



Proof of Lemma 3.9. Following the proof of Lai and Wei ([15], Theorem 2), the equation (2.8) shows

$$\sum_{t=1}^T U_tU_t' = U\Big(\sum_{t=1}^T U_tU_t' - U_TU_T' + U_0U_0'\Big)U' + M_U\Big(\sum_{t=1}^T\varepsilon_t\varepsilon_t'\Big)M_U' + O\Big(\sum_{t=1}^T U_{t-1}\varepsilon_t'\Big).$$

Due to Nielsen ([17], Theorems 2.4, 5.1, Example 6.5) both $\sum_{t=1}^T U_{t-1}\varepsilon_t'$ and $U_TU_T'$ are $o(T^{1-\zeta})$. Note that Assumption (2.5) is not needed as $U_t$ does not involve deterministic terms. Denoting $F_T = T^{-1}\sum_{t=1}^T U_tU_t'$, it follows from Lemma 3.1 that

$$F_T - UF_TU' = M_U\Omega M_U' + o(T^{-\zeta}).$$

This equation has a unique solution $F_T = \sum_{t=0}^\infty U^t\{M_U\Omega M_U' + o(T^{-\zeta})\}(U^t)'$, see Anderson and Moore ([2], p. 336), which in turn equals $F + o(T^{-\zeta})$ since the maximal eigenvalue of $UU'$ is less than one. □
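The equation $F - UFU' = M_U\Omega M_U'$ in the proof of Lemma 3.9 is a discrete Lyapunov equation. The sketch below solves it in two ways, through the vectorised form $\mathrm{vec}(F) = (I - U\otimes U)^{-1}\mathrm{vec}(M_U\Omega M_U')$ and through the truncated series defining $F$, and then compares with $T^{-1}\sum_t U_tU_t'$ on a simulated path. The matrices `U`, `M_U`, `Omega` and the helper names are hypothetical choices for illustration only.

```python
import numpy as np


def lyapunov_kron(U, Qm):
    """Solve F - U F U' = Qm through vec(F) = (I - U kron U)^{-1} vec(Qm)."""
    n = U.shape[0]
    vec_f = np.linalg.solve(np.eye(n * n) - np.kron(U, U), Qm.reshape(-1))
    return vec_f.reshape(n, n)


def lyapunov_series(U, Qm, terms=400):
    """Truncated series F = sum_{t>=0} U^t Qm (U^t)'."""
    F = np.zeros_like(Qm)
    Ut = np.eye(U.shape[0])
    for _ in range(terms):
        F += Ut @ Qm @ Ut.T
        Ut = Ut @ U
    return F


# Hypothetical stationary block U and loading M_U, with dim U = p = 2.
U = np.array([[0.5, 0.2], [0.0, 0.3]])
M_U = np.array([[1.0, 0.0], [0.4, 1.0]])
Omega = np.array([[1.0, 0.3], [0.3, 2.0]])
Qm = M_U @ Omega @ M_U.T

F_kron = lyapunov_kron(U, Qm)
F_series = lyapunov_series(U, Qm)

# Simulated U_t = U U_{t-1} + M_U eps_t; the sample second moment should be near F.
rng = np.random.default_rng(0)
T = 50_000
eps = rng.multivariate_normal(np.zeros(2), Omega, size=T)
u = np.zeros(2)
S = np.zeros((2, 2))
for t in range(T):
    u = U @ u + M_U @ eps[t]
    S += np.outer(u, u)
F_T = S / T

print(np.round(F_kron, 4))
print(np.round(F_T, 4))
```

The Kronecker form is exact but costs $O(\dim U^6)$ operations; for larger systems a dedicated Lyapunov solver is preferable.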

The leading term in (3.7) is now written as a single quadratic form $Q(Y_{t-1})$.

Lemma 3.10. Suppose $A_k = 0$ and that Assumptions (2.1), (2.2) are satisfied with $\lambda > 2$. Then there exists a $\{(p + \dim U)\times p\}$-matrix $C$ with full column rank so that

$$Q(\varepsilon_{t-1}) + Q(U_{t-2}) - Q(U_{t-1}) \overset{a.s.}{=} Q(Y_{t-1}) + o(1),$$

where $Y_t$ is the process $C'(U_{t-1}', \varepsilon_t')'$.

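Proof of Lemma 3.10. The idea is to exploit that the asymptotic covariance for $(U_{t-2}', \varepsilon_{t-1}')'$ is block diagonal with blocks $F$ and $\Omega$. By the above Lemmas 3.1 and 3.9 then, for some $\eta > 0$,

(3.12) $$Q(\varepsilon_{t-1}) + Q(U_{t-2}) \overset{a.s.}{=} \sum_{t=1}^T\varepsilon_t\begin{pmatrix}U_{t-2}\\ \varepsilon_{t-1}\end{pmatrix}'\left\{T\begin{pmatrix}F & 0\\ 0 & \Omega\end{pmatrix}\right\}^{-1}\sum_{t=1}^T\begin{pmatrix}U_{t-2}\\ \varepsilon_{t-1}\end{pmatrix}\varepsilon_t'\,\{1 + o(T^{-\eta})\}.$$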



As discussed in Section 2, $U_{t-1} = UU_{t-2} + M_U\varepsilon_{t-1}$ for some matrix $M_U$ with full column rank. In particular $U_{t-1} = C_\perp'(U_{t-2}', \varepsilon_{t-1}')'$, where the $\{(p + \dim U)\times\dim U\}$-matrix $C_\perp = (U, M_U)'$ has full column rank. Therefore a $\{(p + \dim U)\times p\}$-matrix $C$ can be chosen with full column rank so that the matrix $(C, C_\perp)$ is regular and

$$C_\perp'\begin{pmatrix}F & 0\\ 0 & \Omega\end{pmatrix}C = 0.$$

The sequences $T^{-1}\sum_{t=1}^T U_{t-1}U_{t-1}'$ and $T^{-1}\sum_{t=1}^T U_{t-2}U_{t-2}'$ will have the same limit, $F$, while $T^{-1}\sum_{t=1}^T Y_{t-1}Y_{t-1}'$ will converge to a positive definite matrix $G$. It then holds that

$$(C_\perp, C)'\begin{pmatrix}F & 0\\ 0 & \Omega\end{pmatrix}(C_\perp, C) = \begin{pmatrix}F & 0\\ 0 & G\end{pmatrix}.$$

Pre- and post-multiplying the middle matrix in (3.12) by $(C_\perp, C)(C_\perp, C)^{-1}$ and its transpose then implies

$$Q(\varepsilon_{t-1}) + Q(U_{t-2}) \overset{a.s.}{=} \{Q(U_{t-1}) + Q(Y_{t-1})\}\{1 + o(T^{-\eta})\}.$$

Theorem 2.4 of Nielsen (2005) implies that $Q(U_{t-1})$ and $Q(Y_{t-1})$ are $o(T^\eta)$, which gives the desired result. □
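The construction of $C$ in the proof of Lemma 3.10 can be checked numerically. One way to pick $C$, assumed here rather than taken from the source, is to let its columns span the null space of $C_\perp'\,\mathrm{diag}(F,\Omega)$. The sketch verifies that $(C_\perp, C)'\mathrm{diag}(F,\Omega)(C_\perp, C)$ is then block diagonal with $F$ in the upper block and a positive definite $G$ in the lower block. The inputs and helper names are hypothetical.

```python
import numpy as np


def lyapunov(U, Qm):
    """F solving F - U F U' = Qm, via the vectorised (Kronecker) form."""
    n = U.shape[0]
    vec_f = np.linalg.solve(np.eye(n * n) - np.kron(U, U), Qm.reshape(-1))
    return vec_f.reshape(n, n)


def orthogonal_complement(A, H):
    """Columns spanning {c : A' H c = 0}, an H-orthogonal complement of span(A)."""
    _, s, vt = np.linalg.svd(A.T @ H)
    rank = int(np.sum(s > 1e-10 * s.max()))
    return vt[rank:].T


# Hypothetical inputs with dim U = p = 2.
U = np.array([[0.5, 0.2], [0.0, 0.3]])
M_U = np.array([[1.0, 0.0], [0.4, 1.0]])
Omega = np.array([[1.0, 0.3], [0.3, 2.0]])
F = lyapunov(U, M_U @ Omega @ M_U.T)

d, p = U.shape[0], Omega.shape[0]
H = np.block([[F, np.zeros((d, p))], [np.zeros((p, d)), Omega]])  # diag(F, Omega)
C_perp = np.hstack([U, M_U]).T        # U_{t-1} = C_perp' (U_{t-2}', eps_{t-1}')'
C = orthogonal_complement(C_perp, H)  # (p + dim U) x p

W = np.hstack([C_perp, C])
B = W.T @ H @ W                        # expected to equal diag(F, G)
G = B[d:, d:]
print(np.round(B, 6))
```

The upper block equals $F$ because $C_\perp'\mathrm{diag}(F,\Omega)C_\perp = UFU' + M_U\Omega M_U'$, which is the Lyapunov equation of Lemma 3.9.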

The asymptotic distribution of the leading term $Q(Y_{t-1})$ now follows.



Lemma 3.11. Suppose $A_k = 0$ and that Assumptions (2.1), (2.2) are satisfied with $\lambda > 4$. Then

(i) $1 \le \limsup_{T\to\infty}(2\log\log T)^{-1}\,\mathrm{tr}\{Q(Y_{t-1})\Omega^{-1}\} \le p^2$ a.s.;

(ii) $\mathrm{tr}\{Q(Y_{t-1})\Omega^{-1}\} \overset{D}{\to} \chi^2(p^2)$.

Proof of Lemma 3.11. (i) This follows from the Law of the Iterated Logarithm of Heyde and Scott ([12], Corollary 2) and Hannan ([9], pp. 1076-1077). See Quinn for details.

(ii) This follows from Brown and Eagleson's Central Limit Theorem, which requires the existence of second moments of $\varepsilon_tY_{t-1}'$. □

Remark 3.12. The proof of Lemma 3.11 actually only requires the existence of fourth moments, which is slightly weaker than the stated condition $\lambda > 4$ in Assumption (2.1). In Lemma 3.11(ii) this can be relaxed to a second moment condition if, for instance:
(II) the innovations $\varepsilon_t$ are independent.



3.4. Proofs of results for likelihood ratio test statistics

Proof of Theorem 2.1. Consider the formula (3.1). The term $\hat\Omega_{k-1}$ was dealt with in Lemma 3.1. As for the term $T(\hat\Omega_{k-1} - \hat\Omega_k)$, consider two cases.
When $k = 1$, then $T(\hat\Omega_{k-1} - \hat\Omega_k) = Q(\varepsilon_{t-1})$.

When $k > 1$, apply the expansion in Lemma 3.4. The term $R_V$ vanishes due to Lemma 3.5(ii) when Assumption (2.2) is satisfied. The term $R_\varepsilon$ vanishes due to Lemma 3.6 when $\lambda > 4$ in Assumption (2.1). Due to Lemma 3.10 the leading term is now $Q(Y_{t-1})$, provided Assumption (2.2) holds.

For any $k$ the desired $\chi^2$-distribution now arises from Lemma 3.11(ii), provided Assumptions (2.2), (2.1) are satisfied with $\lambda > 4$. □
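A small Monte Carlo experiment illustrates the conclusion of Theorem 2.1 for $p = 1$. Assumptions not taken from the source: the statistic is taken as $T\log(\hat\Omega_{k-1}/\hat\Omega_k)$, the data are a Gaussian random walk (so $k_0 = 1$ and the root is at one), a constant is the deterministic term, and `rss` and `lr_stat` are hypothetical helpers. No information about the location of the root is used when forming the test.

```python
import numpy as np


def rss(x, k, T, K):
    """Residual sum of squares of X_t on (1, X_{t-1}, ..., X_{t-k}), t = 1..T.

    x holds K presample values followed by X_1, ..., X_T, so every k <= K
    uses the same effective sample, as in the text.
    """
    y = x[K:K + T]
    cols = [np.ones(T)] + [x[K - i:K - i + T] for i in range(1, k + 1)]
    Z = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ coef
    return e @ e


def lr_stat(x, k, T, K):
    """Assumed form of the statistic: T log(Omega_hat_{k-1} / Omega_hat_k) for p = 1."""
    return T * np.log(rss(x, k - 1, T, K) / rss(x, k, T, K))


rng = np.random.default_rng(0)
T, K, k, reps = 200, 2, 2, 2000
stats = np.empty(reps)
for r in range(reps):
    x = np.cumsum(rng.standard_normal(K + T))   # random walk: k0 = 1, unit root
    stats[r] = lr_stat(x, k, T, K)

print(stats.mean())              # should be near 1, the mean of chi^2(1)
print(np.mean(stats > 3.841))    # should be near the nominal 5% level
```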
Proof of Theorem 2.3. Note first that $T(\hat\Omega_{k-1} - \hat\Omega_{k+m-1})$ can be written as $Q(X_{t-k},\dots,X_{t-k-m+1}\mid\mathbf{X}_{t-1}, D_t)$, where $\mathbf{X}_{t-1} = (X_{t-1}',\dots,X_{t-k+1}')'$. Consider now the proof of the decomposition in Lemma 3.2. Using first (3.4) and then (3.5) repeatedly, it is seen that

$$T(\hat\Omega_{k-1} - \hat\Omega_{k+m-1}) = Q(\mathbf{X}_{t-1}, X_{t-k},\dots,X_{t-k-m+1}\mid D_t) - Q(\mathbf{X}_{t-1}\mid D_t)$$
$$= \sum_{j=1}^m Q(\varepsilon_{t-j}\mid\varepsilon_{t-j-1},\dots,\varepsilon_{t-m}, \mathbf{X}_{t-m-1}, D_t) + Q(\mathbf{X}_{t-m-1}\mid D_t) - Q(\mathbf{X}_{t-1}\mid D_t).$$

As in the proof of Theorem 2.1, Lemmas 3.4, 3.5(ii) and 3.6 show that the leading terms reduce to

$$T(\hat\Omega_{k-1} - \hat\Omega_{k+m-1}) = \sum_{j=1}^m Q(\varepsilon_{t-j}) + Q(U_{t-m-1}) - Q(U_{t-1}) + o_P(1)$$

when $k_0 < k$. A slight generalisation of Lemma 3.10 is needed, using that the asymptotic covariance for $Z_{t-1} = (U_{t-m-1}', \varepsilon_{t-1}',\dots,\varepsilon_{t-m}')'$ is block diagonal with blocks $F, \Omega, \dots, \Omega$. A $\{(mp + \dim U)\times mp\}$-matrix $C$ can then be found giving rise to a process $Y_{t-1} = C'Z_{t-1}$. The argument is completed using a Central Limit Theorem as in the proof of Lemma 3.11(ii). □
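The joint statistic of Theorem 2.3 can be illustrated in the same way for $p = 1$, $k = 2$ and $m = 2$, where the limit is $\chi^2(mp^2) = \chi^2(2)$. The sketch also checks a simpler analogue of the sum over $j$ in the decomposition: the joint log-ratio statistic telescopes exactly into $m$ successive one-lag statistics. The helper `rss`, the random-walk design and the log-ratio form of the statistic are assumptions for illustration only.

```python
import numpy as np


def rss(x, k, T, K):
    """RSS of X_t on (1, X_{t-1}, ..., X_{t-k}) over the common sample t = 1..T."""
    y = x[K:K + T]
    Z = np.column_stack([np.ones(T)] + [x[K - i:K - i + T] for i in range(1, k + 1)])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ coef
    return e @ e


rng = np.random.default_rng(1)
T, k, m, reps = 200, 2, 2, 2000
K = k + m - 1                                      # largest lag length fitted
joint, tele_err = np.empty(reps), 0.0
for r in range(reps):
    x = np.cumsum(rng.standard_normal(K + T))      # k0 = 1 < k
    r_all = [rss(x, j, T, K) for j in range(k - 1, K + 1)]
    joint[r] = T * np.log(r_all[0] / r_all[-1])    # T log(Omega_{k-1} / Omega_{k+m-1})
    steps = [T * np.log(r_all[i] / r_all[i + 1]) for i in range(m)]
    tele_err = max(tele_err, abs(joint[r] - sum(steps)))

print(joint.mean())   # should be near m = 2, the mean of chi^2(m p^2) with p = 1
print(tele_err)       # telescoping is exact up to rounding
```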

3.5. Proofs of results for information criteria

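Proof of Theorem 2.4. Consider $j < k_0$. The condition $f(T) = o(T)$ implies

$$\Phi_j - \Phi_{k_0} = \log\det\{I_p + (\hat\Omega_j - \hat\Omega_{k_0})\hat\Omega_{k_0}^{-1}\} + o(1).$$

Lemma 3.1 shows that $\hat\Omega_{k_0} \overset{a.s.}{\to} \Omega$, so it suffices that $\liminf_{T\to\infty}\lambda_{\max}(\hat\Omega_j - \hat\Omega_{k_0})$ is positive.

Defining $Y_t = (X_t',\dots,X_{t-j+1}')'$ and $Z_t = (X_{t-j}',\dots,X_{t-k_0+1}')'$, it holds that

$$\hat\Omega_j - \hat\Omega_{k_0} = \Big[T^{-1/2}\sum_{t=1}^T X_t(Z_{t-1}\mid Y_{t-1},D_t)'\Big\{\sum_{t=1}^T(Z_{t-1}\mid Y_{t-1},D_t)^{\otimes2}\Big\}^{-1/2}\Big]^{\otimes2}.$$

Define $A_y = (A_1,\dots,A_j)$ and $A_z = (A_{j+1},\dots,A_{k_0})$, noting that $A_{k_0} \ne 0$. Then it holds that $X_t = A_yY_{t-1} + A_zZ_{t-1} + \mu D_t + \varepsilon_t$. Therefore $\hat\Omega_j - \hat\Omega_{k_0}$ equals

$$\Big[T^{-1/2}\sum_{t=1}^T\varepsilon_t(Z_{t-1}\mid Y_{t-1},D_t)'\Big\{\sum_{t=1}^T(Z_{t-1}\mid Y_{t-1},D_t)^{\otimes2}\Big\}^{-1/2} + A_z\Big\{T^{-1}\sum_{t=1}^T(Z_{t-1}\mid Y_{t-1},D_t)^{\otimes2}\Big\}^{1/2}\Big]^{\otimes2}.$$

The first term is of order $o(1)$ a.s. by Nielsen ([13], Theorem 2.4). As for the second term, it holds that $\liminf_{T\to\infty}\lambda_{\min}[T^{-1}\sum_{t=1}^T\{(Y_{t-1}', Z_{t-1}')'\mid D_t\}^{\otimes2}] > 0$ a.s. according to Nielsen ([13], Corollary 9.5). As a consequence the limit points of $T^{-1}\sum_{t=1}^T(Z_{t-1}\mid Y_{t-1},D_t)^{\otimes2}$ are positive definite. Since $A_z \ne 0$, then $\liminf_{T\to\infty}\lambda_{\max}(\hat\Omega_j - \hat\Omega_{k_0}) > 0$ and therefore $\liminf_{T\to\infty}\hat k \ge k_0$ a.s. □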
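Theorem 2.4 states that an information criterion does not underfit asymptotically. The sketch below applies an assumed criterion of the form $\Phi_k = \log\hat\Omega_k + kT^{-1}f(T)$ for $p = 1$, with $f(T) = \log T$ or $f(T) = 2\log\log T$, to a simulated AR(2) with companion eigenvalues $1$ and $0.5$, so that $k_0 = 2$. The design, the helper names `rss` and `select_order`, and the exact penalty form are illustrative assumptions, not the paper's definitions.

```python
import numpy as np


def rss(x, k, T, K):
    """RSS of X_t on (1, X_{t-1}, ..., X_{t-k}) over the common sample t = 1..T."""
    y = x[K:K + T]
    Z = np.column_stack([np.ones(T)] + [x[K - i:K - i + T] for i in range(1, k + 1)])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ coef
    return e @ e


def select_order(x, T, K, f):
    """Minimise the assumed criterion Phi_k = log(RSS_k / T) + k f(T) / T over k = 0..K."""
    phi = [np.log(rss(x, k, T, K) / T) + k * f(T) / T for k in range(K + 1)]
    return int(np.argmin(phi)), phi


# X_t = 1.5 X_{t-1} - 0.5 X_{t-2} + eps_t: one unit root and one stationary root, k0 = 2.
rng = np.random.default_rng(2)
T, K = 500, 4
x = np.zeros(K + T)
e = rng.standard_normal(K + T)
for t in range(2, K + T):
    x[t] = 1.5 * x[t - 1] - 0.5 * x[t - 2] + e[t]

k_bic, _ = select_order(x, T, K, f=np.log)                          # f(T) = log T
k_hq, _ = select_order(x, T, K, f=lambda n: 2 * np.log(np.log(n)))  # f(T) = 2 log log T
print(k_bic, k_hq)
```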
Proof of Theorem 2.5. Consider now $k_0 \le j < K$. It then holds that

$$\Phi_{j+1} - \Phi_j = \log\det(\hat\Omega_{j+1}\hat\Omega_j^{-1}) + T^{-1}f(T) = \log\det\{I_p - (\hat\Omega_j - \hat\Omega_{j+1})\hat\Omega_j^{-1}\} + T^{-1}f(T).$$

A Taylor expansion shows

$$\Phi_{j+1} - \Phi_j \overset{a.s.}{=} -\mathrm{tr}\{(\hat\Omega_j - \hat\Omega_{j+1})\hat\Omega_j^{-1}\} + T^{-1}f(T) + O[\{(\hat\Omega_j - \hat\Omega_{j+1})\hat\Omega_j^{-1}\}^2].$$

Lemma 3.1 shows that $\hat\Omega_j$ is consistent, while Lemma 3.4 gives the expansion

$$T(\hat\Omega_j - \hat\Omega_{j+1}) \overset{a.s.}{=} Q(\varepsilon_{t-1}) + Q(U_{t-2}) - Q(U_{t-1}) + R_\varepsilon + R_V + o(1).$$

To complete the proof it has to be shown that $T\{f(T)\}^{-1}(\Phi_{j+1} - \Phi_j)$ has a positive limiting value. This holds if $T(\hat\Omega_j - \hat\Omega_{j+1}) = o\{g(T)\}$ for some function $g(T)$ such that $f(T)/g(T) \to \infty$.

(i) The term $R_V$ vanishes due to Lemma 3.5(ii) when Assumption (2.2) is satisfied. The term $R_\varepsilon$ vanishes due to Lemma 3.6 when $\lambda > 4$ in Assumption (2.1). Due to Lemma 3.10 the leading term is $Q(Y_{t-1})$, provided Assumption (2.2) holds. This is $O_P(1)$ by Lemma 3.11(ii), provided Assumptions (2.1), (2.2) are satisfied with $\lambda > 4$.

(ii) The term $R_V$ is $O(\log T)$ due to Lemma 3.5(i). The term $R_\varepsilon$ vanishes due to Lemma 3.6 when $\lambda > 4$ in Assumption (2.1). Due to Lemma 3.8 the leading term is $O(\log\log T)$.

(iii) Under (A), that $V$ and $D$ have no common eigenvalues, $R_V$ is $O\{(\log T)^{1/2}(\log\log T)^{1/2}\}$ due to Lemma 3.5(iii). The argument of (ii) can then be followed.

(iv) Under (B), that $\dim D = 0$ with $V = 1$, $R_V$ is $o(1)$ due to Lemma 3.5(iv), whereas under (C), that $\dim V = 0$, $R_V = 0$ due to Lemma 3.5(v). The argument of (ii) can then be followed.

(v) The terms $R_V$ and $R_\varepsilon$ vanish as in (iv). As in (i) the leading term is $Q(Y_{t-1})$ by Lemma 3.10, provided Assumption (2.2) holds. This is of the desired order of magnitude by Lemma 3.11(i), provided Assumptions (2.2), (2.1) are satisfied with $\lambda > 4$. □
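The Taylor step in the proof of Theorem 2.5 uses $\log\det(I_p - A) = -\mathrm{tr}A + O(\|A\|^2)$, which follows from the series $\log\det(I - A) = -\sum_{n\ge1}\mathrm{tr}(A^n)/n$ for small $A$. The sketch checks numerically that the remainder divided by $s^2$ approaches $-\mathrm{tr}(B^2)/2$ when $A = sB$. The random matrix `B` and the helper `logdet_gap` are hypothetical stand-ins for $(\hat\Omega_j - \hat\Omega_{j+1})\hat\Omega_j^{-1}$.

```python
import numpy as np


def logdet_gap(A):
    """log det(I - A) + tr(A): the remainder after the first-order Taylor term."""
    sign, logabs = np.linalg.slogdet(np.eye(A.shape[0]) - A)
    assert sign > 0, "I - A should have positive determinant for small A"
    return logabs + np.trace(A)


rng = np.random.default_rng(3)
B = rng.standard_normal((3, 3))
for s in [1e-1, 1e-2, 1e-3, 1e-4]:
    A = s * B                                  # stands in for (Omega_j - Omega_{j+1}) Omega_j^{-1}
    ratio = logdet_gap(A) / s**2
    print(s, ratio, -np.trace(B @ B) / 2)      # ratio approaches -tr(B^2)/2
```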

Remark 3.13. The condition $\lambda > 4$ in Theorem 2.5 can be relaxed as follows.

(i) It is used first in Lemma 3.6 and can be relaxed under (I,a) or (I,b), as this is a result holding in probability, see Remark 3.7. It is used secondly in Lemma 3.11(ii) and can be relaxed under (II), see Remark 3.12.

(ii), (iii), (iv) It is only used in Lemma 3.6 and can only be relaxed under (I,a), as this is a result holding almost surely, see Remark 3.7.

(v) It is indeed required in Lemma 3.11(i).

3.6. Proof of results for residual based tests

Proof. It suffices to show how the residual based test statistics relate to the likelihood ratio test statistics.

In the joint test the squared sample multiple correlation $R^2$ of $\hat\varepsilon_t$ and the vector $Z_{t-1} = (\hat\varepsilon_{t-1}',\dots,\hat\varepsilon_{t-m}', \mathbf{X}_{t-1}', D_t')'$ is considered, recalling that $\mathbf{X}_{t-1}$ is defined as $(X_{t-1}',\dots,X_{t-k+1}')'$. The key to the result is that

$$\hat\varepsilon_{t-j} = X_{t-j} - \hat B\mathbf{X}_{t-j-1} - \hat\mu D^{-j}D_t,$$

where $\hat B$, $\hat\mu$ are least squares estimators for the full sample $t = 1,\dots,T$. Due to the inclusion of $\mathbf{X}_{t-1}$ as regressor it follows that $Z_{t-1} = N\tilde Z_{t-1}$, where $\tilde Z_{t-1} = (X_{t-1}',\dots,X_{t-k-m+1}', D_t')'$ and the square matrix $N$ is based on $\hat B$, $\hat\mu$ and is invertible with probability one. By the invariance of sample multiple correlations to linear transformations, $R^2$ can then be computed from $\hat\varepsilon_t$ and $\tilde Z_{t-1}$. By the same type of manipulation as in Lemma 3.2 it follows that

$$Q(\tilde Z_{t-1}) = \sum_{t=m+1}^T\hat\varepsilon_t\tilde Z_{t-1}'\Big(\sum_{t=m+1}^T\tilde Z_{t-1}\tilde Z_{t-1}'\Big)^{-1}\sum_{t=m+1}^T\tilde Z_{t-1}\hat\varepsilon_t'$$

can be written as

(3.13) $$Q(\tilde Z_{t-1}) = Q(X_{t-k},\dots,X_{t-k-m+1}\mid\mathbf{X}_{t-1}, D_t) + Q(\mathbf{X}_{t-1}, D_t).$$

Since the first term in (3.13) includes the regressors $\mathbf{X}_{t-1}$, $D_t$, the residual $\hat\varepsilon_t$ can be replaced by $\varepsilon_t$. Thus, apart from starting the regression at $t = m + 1$ instead of $t = 1$, this term is the same as $Q(X_{t-k},\dots,X_{t-k-m+1}\mid\mathbf{X}_{t-1}, D_t)$. It therefore has the same asymptotic properties as $T(\hat\Omega_{k-1} - \hat\Omega_{k+m-1})$, which was studied in the proof of Theorem 2.3.

The second term in (3.13) vanishes asymptotically. This is because the residuals $\hat\varepsilon_t$ are orthogonal to $\mathbf{X}_{t-1}$, $D_t$ when evaluated over $t = 1,\dots,T$. A tedious analysis shows that this orthogonality holds asymptotically when evaluated over $t = m+1,\dots,T$.

For the marginal test the argument is the same. The main difference is that the residuals are now

$$\hat\varepsilon_{t-j,\mathrm{marg}} = X_{t-j,1} - \hat B_{\mathrm{marg}}\mathbf{X}_{t-j-1} - \hat\mu_{\mathrm{marg}}D^{-j}D_t.$$

Once again the inclusion of $\mathbf{X}_{t-1}$ as regressor implies that the vector $Z_{t-1,\mathrm{marg}}$, defined as $(\hat\varepsilon_{t-1,\mathrm{marg}}',\dots,\hat\varepsilon_{t-m,\mathrm{marg}}', \mathbf{X}_{t-1}', D_t')'$, can be replaced by the above $\tilde Z_{t-1}$. So the statistic $Q(\tilde Z_{t-1})$ is replaced by a statistic based on $\hat\varepsilon_{t,\mathrm{marg}}$, but the same $\tilde Z_{t-1}$.
For the conditional test the residuals are of the type

$$\hat\varepsilon_{t-j,\mathrm{cond}} = X_{t-j,1} - \hat B_{\mathrm{cond}}\mathbf{X}_{t-j-1} - \hat\mu_{\mathrm{cond}}D^{-j}D_t - \hat\omega X_{t-j,2}.$$

The same argument applies as for the marginal test. □
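Two algebraic facts drive the proof above: the quadratic form is unchanged when the regressor vector is replaced by $N$ times itself for a regular $N$, and it splits as in (3.13) into a partialled-out part plus the part due to the conditioning regressors (the Frisch-Waugh-Lovell partialling-out identity). The sketch checks both on random placeholder data; `eps_hat`, `Xa`, `Xb` and the helper `Q` are hypothetical and are not output of a fitted VAR.

```python
import numpy as np


def Q(eps, x, y=None):
    """Q(x_t | y_t) = sum eps_t x~_t' (sum x~_t x~_t')^{-1} sum x~_t eps_t', x~ = x partialled on y."""
    if y is not None:
        coef, *_ = np.linalg.lstsq(y, x, rcond=None)
        x = x - y @ coef
    s_ex = eps.T @ x
    return s_ex @ np.linalg.solve(x.T @ x, s_ex.T)


rng = np.random.default_rng(4)
n, p = 300, 2
eps_hat = rng.standard_normal((n, p))   # stands in for residuals over t = m+1..T
Xa = rng.standard_normal((n, 4))        # stands in for X_{t-k}, ..., X_{t-k-m+1}
Xb = np.column_stack([rng.standard_normal((n, 3)), np.ones(n)])  # stands in for (X_{t-1}', D_t')'
Z = np.column_stack([Xa, Xb])

lhs = Q(eps_hat, Z)
rhs = Q(eps_hat, Xa, Xb) + Q(eps_hat, Xb)            # decomposition as in (3.13)
N = rng.standard_normal((Z.shape[1], Z.shape[1]))    # generic, almost surely invertible
print(np.max(np.abs(lhs - rhs)), np.max(np.abs(lhs - Q(eps_hat, Z @ N.T))))
```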